fix: metric loss when a previously expired metric is re-added #11

Merged: membphis merged 7 commits into api7:main from shreemaan-abhishek:fix-metric-miss on Mar 2, 2026

shreemaan-abhishek commented Feb 25, 2026

Description

Each worker keeps a local cache (self.lookup and self.index) to speed up metric lookups. ngx.shared.dict evicts keys automatically when their TTL expires, but the worker-local Lua cache is not synced to reflect that eviction, so the two drift apart:

  1. A metric key expires and is removed from the global ngx.shared.dict.
  2. A worker still holds a reference to this metric in its local self.index cache.
  3. When the worker tries to increment/update this metric, KeyIndex:add() sees the metric in self.index and skips adding it to the shared dictionary.
  4. As a result, the metric is completely lost from the KeyIndex:list() scrape cycle, causing missing metrics in Prometheus.
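
The drift described above can be reproduced in plain Lua. This is a minimal sketch, not the library's code: MockDict is a hypothetical stand-in for ngx.shared.dict with a manually advanced clock, the "idx_" key prefix is an assumption, and KeyIndex here is a simplified model that keeps only the behavior relevant to the bug.

```lua
-- Hypothetical mock of a TTL-aware shared dict; time is advanced manually.
local MockDict = {}
MockDict.__index = MockDict

function MockDict.new()
  return setmetatable({ data = {}, now = 0 }, MockDict)
end

function MockDict:set(key, value, exptime)
  local expires = (exptime and exptime > 0) and (self.now + exptime) or nil
  self.data[key] = { value = value, expires = expires }
end

function MockDict:get(key)
  local entry = self.data[key]
  if entry and entry.expires and entry.expires <= self.now then
    self.data[key] = nil  -- eviction happens in the shared dict only
    return nil
  end
  return entry and entry.value
end

-- Simplified key index with a worker-local cache (assumed shape).
local KeyIndex = {}
KeyIndex.__index = KeyIndex

function KeyIndex.new(dict)
  return setmetatable({ dict = dict, index = {}, count = 0 }, KeyIndex)
end

-- Buggy behavior: trusts self.index and never re-checks the shared dict.
function KeyIndex:add(key, exptime)
  if self.index[key] then
    return
  end
  self.count = self.count + 1
  self.index[key] = self.count
  self.dict:set("idx_" .. self.count, key, exptime)
end

function KeyIndex:list()
  local keys = {}
  for i = 1, self.count do
    local key = self.dict:get("idx_" .. i)
    if key then
      keys[#keys + 1] = key
    end
  end
  return keys
end

local dict = MockDict.new()
local key_index = KeyIndex.new(dict)

key_index:add("http_requests_total", 10)
local before = #key_index:list()          -- 1: the metric is listed

dict.now = dict.now + 11                  -- TTL passes; the shared entry is evicted
key_index:add("http_requests_total", 10)  -- skipped: key is still in self.index
local after = #key_index:list()           -- 0: the metric is missing from the scrape
```

After the second add() the local index still maps the key to its old slot, while the shared dict no longer holds it, so list() returns nothing for that metric.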

Solution:

  • When a worker looks up a metric to increment or update it and an expiration time is set (self.exptime), it calls self._key_index:add again so that the key is put back into the index list.

  • Regularly sweep the local index and clear references to keys that have TTL-expired from the global dictionary. This prevents memory leaks in the Lua process and keeps KeyIndex:list() from iterating over thousands of dead keys.
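
The two measures above could look roughly like the following. This is a sketch under stated assumptions, not the merged patch: MockDict is a hypothetical stand-in for ngx.shared.dict, the "idx_" prefix and the sweep() method name are invented for illustration, and the re-validation inside add() is one possible way to put an expired key back into the index.

```lua
-- Hypothetical mock of a TTL-aware shared dict; time is advanced manually.
local MockDict = {}
MockDict.__index = MockDict

function MockDict.new()
  return setmetatable({ data = {}, now = 0 }, MockDict)
end

function MockDict:set(key, value, exptime)
  local expires = (exptime and exptime > 0) and (self.now + exptime) or nil
  self.data[key] = { value = value, expires = expires }
end

function MockDict:get(key)
  local entry = self.data[key]
  if entry and entry.expires and entry.expires <= self.now then
    self.data[key] = nil
    return nil
  end
  return entry and entry.value
end

local KeyIndex = {}
KeyIndex.__index = KeyIndex

function KeyIndex.new(dict)
  return setmetatable({ dict = dict, index = {}, count = 0 }, KeyIndex)
end

-- Re-validate the cached slot against the shared dict before trusting it.
function KeyIndex:add(key, exptime)
  local slot = self.index[key]
  if slot and self.dict:get("idx_" .. slot) == key then
    return  -- still present in the shared dict
  end
  self.count = self.count + 1
  self.index[key] = self.count
  self.dict:set("idx_" .. self.count, key, exptime)
end

-- Drop local references whose shared-dict entry has expired.
-- Clearing existing fields while iterating with pairs() is allowed in Lua.
function KeyIndex:sweep()
  local removed = 0
  for key, slot in pairs(self.index) do
    if self.dict:get("idx_" .. slot) ~= key then
      self.index[key] = nil
      removed = removed + 1
    end
  end
  return removed
end

function KeyIndex:list()
  local keys = {}
  for i = 1, self.count do
    local key = self.dict:get("idx_" .. i)
    if key then
      keys[#keys + 1] = key
    end
  end
  return keys
end

local dict = MockDict.new()
local key_index = KeyIndex.new(dict)

key_index:add("http_requests_total", 10)
key_index:add("stale_metric", 10)
dict.now = dict.now + 11                  -- both index entries expire

key_index:add("http_requests_total", 10)  -- re-added under a new slot
local removed = key_index:sweep()         -- drops "stale_metric" locally
local listed = key_index:list()
```

In this model the re-added metric shows up in list() again, and the sweep removes the one local reference that no longer has a counterpart in the shared dict.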

lgang06 and others added 4 commits April 5, 2024 15:06
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
… the index

Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
shreemaan-abhishek (Author) commented

@coderabbitai pls review

f
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>
Signed-off-by: Abhishek Choudhary <shreemaan.abhishek@gmail.com>

membphis commented Feb 27, 2026

@shreemaan-abhishek Why do we need this PR? Can you tell me more about why we do this?

It seems related to apache/apisix#11113.

shreemaan-abhishek (Author) commented

@membphis yes, you are right.

membphis merged commit 408f825 into api7:main on Mar 2, 2026
3 checks passed